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TWO-STEP SPLINE ESTIMATING EQUATIONS FOR 
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WITH LARGE CLUSTER SIZES 

By Shujie Ma 

University of California, Riverside 

We propose a two-step estimating procedure for generalized ad- 
ditive partially linear models with clustered data using estimating 
equations. Our proposed method applies to the case that the number 
of observations per cluster is allowed to increase with the number of 
independent subjects. We establish oracle properties for the two-step 
estimator of each function component such that it performs as well 
as the univariate function estimator by assuming that the parametric 
vector and all other function components are known. Asymptotic dis- 
tributions and consistency properties of the estimators are obtained. 
Finite-sample experiments with both simulated continuous and bi- 
nary response variables confirm the asymptotic results. We illustrate 
the methods with an application to a U.S. unemployment data set. 

1. Introduction. The generalized estimating equations (GEE) approach 
has been widely applied to the analysis of clustered data. Reference [15] 
introduced the GEEs to estimate the regression parameters of generalized 
linear models with possible unknown correlations between responses. The 
GEE approach only requires the first two marginal moments and a working 
correlation matrix that accounts for the form of within-subject correlations 
of responses, and it can yield consistent parameter estimators even when the 
covariance structure is misspecified, as long as the mean function is correctly 
specified. 

Parametric GEEs enjoy simplicity by assuming a fully predetermined 
parametric form for the mean function, but they have suffered from inflex- 
ibility in modeling complicated relationships between the response and co- 
variates in clustered data studies. To allow for flexibility, [9, 32] and [16] pro- 
posed to model covariate effects nonparametrically via GEE. The proposed 
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nonparametric GEE method enables us to capture the underlying structure 
that otherwise can be missed. Reference [17] extended the kernel estimating 
equations in [16] to generalized partially linear models (GPLMs), which as- 
sume that the mean of the outcome variable depends on a vector of covari- 
ates parametrically and a scalar predictor nonparametrically to overcome 
the "curse of dimensionality" of nonparametric models. As an extension, [6] 
and [14] approximated the nonparametric function in GPLMs by regression 
splines. It is pointed out in [31] and [18] that splines effectively account for 
the correlations of clustered data and are more efficient in nonparametric 
models with longitudinal data than conventional local-polynomials. Splines 
also provide optimal convergence rates in partially linear models [7, 8]. To 
allow the nonparametric part in partially linear models to include multivari- 
ate covariates, [21] extended the estimating equations method to generalized 
additive partially linear models (GAPLMs) with an identity link for contin- 
uous response cases, and obtained estimators for the parametric vector and 
the nonparametric additive functions via a one-step spline estimation. 

To introduce GAPLMs for clustered data, denote {(Yij, Xjj, Zjj), 1 < i < 
n, 1 < j < rrii} as the jth repeated observation for the ith. subject or experi- 
mental unit, where Yij is the response variable, Xjj = (1, Xjji, . . . , 
and Zjj = {Ziji, . . . jZjj^j)'^ o?i -dimensional and (i2-dimensional vectors 
of covariates, respectively. The marginal model assumes that — fiij -t- Sij^ 
and the marginal mean Hij depends on Xjj and Zjj through a known mono- 
tonic and differentiable link function ■!?, so that the GAPLM is given as 

(1) r]ij = '&{nij)=X.Jj^ + ^ei{Ziji), j = l,...,mi,i = l,...,n, 

1=1 

where /3 is a di-dimensional regression parameter, and 6i, 1 = 1,... ,d2, are 
unknown but smooth functions. We assume £j = (eji) • ■ • j^imj'^ ~ (0, 
For identifiability, both the additive and linear components must be cen- 
tered, that is, EOi{Ziji) = 0, I = 1, . . . , (i2, EXiji^ = 0, k = 1, . . . ,di. Model 
(1) can either become a generalized additive model [5] if the parameter vec- 
tor /3 = or be a generalized linear model if 9i{-) = 0,1 < I < d2. Model (1) is 
more parsimonious and easier to interpret than purely generalized additive 
models by allowing a subset of predictors to be discrete and unbounded, 
modeled as some of the variables (-'^^jjfc)fcL7)^ ^.nd more flexible than gener- 
alized linear models by allowing nonlinear relationships. 

The GEE methods have been widely applied to analyze clustered data 
with small cluster sizes and a large number of subjects n. However, data 
with large cluster sizes have occurred frequently in various fields such as 
machine learning, pattern recognition, image analysis, information retrieval 
and bioinformatics. Reference [33] first studied the asymptotics for para- 
metric GEE estimators with large cluster sizes. As an extension, we develop 
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asymptotic properties of the spline GEE estimators in the GAPLMs (1) 
when the cluster sizes are allowed to increase with n, that is, the maximum 
cluster size m(„) = maxi<j<„mj is a function of n, such that m(„) — oo as 
n — oo. 

The one-step spline estimation in [21] for GAPLMs with identity link is 
fast to compute but lacks limiting distribution. The traditional backfitting 
approach has been widely used to estimate additive models for independent 
and identically distributed (i.i.d.) and weekly-dependent data [5, 23, 25]. It, 
however, has computational burden issues, due to its iterative nature. More- 
over, it is pointed out in [12] that derivation of the asymptotic properties of 
a backfitting estimator for a model with a link function is very complicated. 
As an alternative, [10, 12, 19] and [11] proposed two-stage kernel based 
estimators for i.i.d. data including one step backfitting of the integration es- 
timators in [19] and one step backfitting of the projection estimators in [10], 
one Newton step from the nonlinear least squares estimators in [12], and the 
extension of the method in [12] to additive quantile regression models. The 
two-stage estimator enjoys the oracle property which backfitting estimators 
do not have, that is, it performs as well as the univariate function estimator 
by assuming that other components are known. 

In this paper, we propose a two-step spline GEE approach to approximate 
6i{-) for 1 <l <d2 in model (1) with going to infinity or bounded, and 
establish oracle efficiency such that the two-step spline GEE estimator of 
6i{-) achieves the same asymptotic distribution of the oracle estimator ob- 
tained by assuming that (3 and other functions 6i'{-) for 1 < < c?2 and /' ^ I 
are known. In the first step, the additive components Oi'{-) for 1 <l' <d2 and 
/' 7^ I are pre-estimated by their pilot estimators through an undersmoothed 
spline procedure. In the second step, a more smoothed spline estimating pro- 
cedure is applied to the univariate data to estimate 9i{-) with asymptotic 
distribution established. The proposed two-step estimators achieve uniform 
oracle efficiency by "reducing bias via undersmoothing" in the first step 
and "averaging out the variance" in the second step. We establish asymp- 
totic consistency and normality of the one-step estimator for the parameter 
vector and the two-step estimators of the nonparametric components. The 
two-step spline GEE approach is inspired by the idea of "spline-backfitted 
kernel/spline smoothing" of [20, 26, 29] and [22] for additive models, additive 
coefficient models and additive partially linear models with i.i.d or weekly- 
dependent data by using least squares. The complex correlations within the 
clusters as well as the non-Gaussian nature of discrete data make the esti- 
mation and development of asymptotic properties in the framework studied 
in this paper much more challenging. 

2. Two-step spline estimating equations. For simplicity, we denote vec- 
tors Yi = {{Yii,..., Yim.J'^}m.^ X 1 and r? . = { {r]ii , . . . , r/j^, V}m,xi, 1 < m-j < 
1 <i <n. Let Sij = Yij - i^iij, and = (ea, • • • ,eimi)^- Similarly, 
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let Xj = {(Xji, . . . ,Ximj'^}m,xdi and Zj = {{Zn, Zim^)^}m,xd2- Assume 
that Ziji is distributed on a compact interval [0^,6;],! < / < d2, and, with- 
out loss of generality, we take all intervals [ai,bi] = [0, 1], 1 < / < ^2- We 
further let OiiZa) = {{Oi^Zm), . . . ,0i{Zimj)}'^}m,xi, for l = l,...,d2. The 
mean function in model (1) can be written in matrix notation as rj. = 

X.i(3 + Y^'l^^9i{Zii), which is the marginal model [15]. Let fi{-) = '&~^{-) be 
the inverse of the link function and /"(rj.) = [{/i(r?ji), • • • , Ai(r/imJ}^]miXi- 

As in [30], we allow Xi and Zj to be dependent. Let Vj = Vj(Xj, ZJ be 
the assumed "working" covariance of Yj, where Vj = A]^^^Rj(Q!)A|''^, Aj = 
Aj(Xj,Zj) denotes an irii x irii diagonal matrix that contains the marginal 
variances of Yij, and Rj is an invertible working correlation matrix, which 
depends on a nuisance parameter vector a. Let 51 j = I]j(Xj, ZJ be the true 
covariance of Yj. If Rj is equal to the true correlation matrix Rj, then 
Vj = S,. 

Following [29], we approximate the nonparametric functions 0fs by cen- 
tered polynomial splines. Let Gn be the space of polynomial splines of degree 
q>l. We introduce a knot sequence with A'^^ interior knots 

t-q = ■ ■ ■ = t-i = to = < ti < ■ ■ ■ < tN < ^ = tN+1 = • • • = iAT+q+l , 

where = Nn increases when the number of subjects n increases, with 
order assumption given in condition (A4). Then G„ consists of functions w 
satisfying the following: (i) is a polynomial of degree q on each of the 
subintervals Ig = [ts,ts+i), s = 0, . . . ,Nn - 1, Inu = [tN^A]; (h) for q>l, 
w \s q — 1 time continuously differentiable on [0, 1]. Let J„ = Nn -|- -|- 1. 
Let {bs^i '-i < I < d2,l < s < Jn + l}^ be a basis system of the space Gn- 
We adopt the centered B-spline space introduced in [34], where B(z) = 
{Bs^i{zi) : 1 < / < c?2, 1 < s < Jn}^ is a basis system of the space Gn with 
BsA^i) = VN;:[bs+i,i{zi) - {E{bs+i,i)/E{b,,i)}bi,i{zi)] and z = {zi)t^. 

Equally-spaced knots are used in this article for simplicity of proof. Other 
regular knot sequences can also be used, with similar asymptotic results. 

Step 1. Pilot estimators of /3 and 9i{-). Suppose that 61 can be approxi- 
mated well by a spline function in G^, so that 

(2) eiizi)^eiizi) = Y,%iBsAzi). 

s=l 

Let 7 = {jsi s = 1, . . . , Jn,l = 1, ■ ■ ■ , d2)'^ be the collection of the coeffi- 
cients in (2), and denote Bj^j = [{Bs,i{Ziji) : s = 1, . . . , J„}'^]j„ xi_and Bjj = 
{(Bj^i, • ■ • , B,^-^J'^}rf2 j„xi, then we have an approximation r/jj rjij = X^./3-F 
Bjj-7- We can also write the approximation in matrix notation as r].^rj.= 
Xi/3 + Bj7, where Bj = {(Bji, . . ■ ,Bi„,J^}„,^^d2J„- Let = [{fi{vii), 
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IJ'{r]im,)}^]miXl- Let 3„ = 0ri,l, ■ ■ ■ 3n,di)^ and 7n = {ln,sV-S = l,...,Jn,l = 

1, . . . , ^2}"'" be the minimizer of 

1 " 

(3) Q„(/3,7) = 2 E{^^ - + B,7)} Vr'(/9,7){Y, - /x(X,/3 + B,7)}, 

i=l 

which is corresponding to the class of working covariance matrices {Vj, 1 < 
i <n}. Then /3„ and 7„ solve the estimating equations 

n 

(4) g„(/3,7) = J]dTa,(/3,7)V-1(/3,7){Y, - K^^fS + B^)} = 0, 

i=l 

where = (Xi,Bj^^x(di+d2Jn)' and 

A, 7) = diag( Ail 7), • • • , Ai,„, (/3, 7)) 

is a diagonal matrix with the diagonal elements being the first derivative 
of /i(-) evaluated at rjij, j = 1,. . . ,mi. Then we let (3^ be the estimator of 
the parameter vector (3. For each 1 < I < d2, the pilot estimator of the Ith 
nonparametric function 6i{zi) is 9n,i{zi) = ^i=iln,siBs,i{zi). The one-step 
spline estimator of each function component has consistency properties, but 
lacks limiting distribution [21, 22, 29]. 

Step II. Two-step spline GEE estimator of 6i{-). Next, we propose a two- 
step spline estimator of 9i{-) for given 1 <l < d2. The basic idea is that for 
every 1<1 < d2, we estimate the /th function 9i{-) in model (1) nonparamet- 
rically with the GEE method by assuming that the parameter vector /3 and 
other nonparametric components 6_i = {Oi'{-) : 1 < /' < d2,l' ^ 1} are known. 
The problem turns into a univariate function estimation problem. Because 
the true parameter vector (3 and functions 6_i are not known in reality, we 
replace them by their pilot estimators from step I to obtain the two-step esti- 
mator of ^/(•). Both kernel and spline based methods can be employed in the 
second step to estimate 6i{-). Here we choose the spline method described 
in the beginning of this section. We use the splines of the same degree q 
as in step I. Denote Bg^ = [{Bfi{Ziji) : s = 1, . . . , J^}^]js^i, where Bfi{zi) 
is the spline function defined in the same way as Bs^i{zi) in step I, but 
with = the number of interior knots and let = + q + 1. De- 
note Bf{zi) = {Bfi{zi), s = I, . . . , J^}^ , Bfi = {{Bai,...,B,^jf}^^^js, 
and -yf = (7^/ : s = 1, . . . , Jjf By assuming that /3 and = {Oii{-) :l' y^l, 
1 < /' < ^2} are known, 9i{zi) is estimated by the oracle estimator 

s=l 
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with J^^i{f3j6^i) = {'^n si(f^-^^-0}s=i solving the estimating equation 

n 
i=l 

( / d2 \\ 



(6) 



T_5 

nif I £j^v I -t- I n ■ I I 



0, 



where Ai(/3, 6>_/,7f ) = diag(Aa(7?fi), . . . , Ai„,^(7?f^J), and Aij(77,^.) is the 

first derivative of evaluated at vif- = ^JjP + YfiLi^v^i + {^fji)^lf, 

J = 1, . . . , mj. We replace the true parameter vector (3 and the true functions 
O-i = < /' < d2,l' 7^ /} with the pilot estimators /3„ and 9n-i = 

{9n,i'{-)A <l'< d2,l' / /}, where 9n,i'{zu) = T,i=i^n,si'Bs,i'{zi'), so that 
6i{zi) is estimated by the two-step spline estimator 

o 

The Newton-Raphson algorithm of GEE is applied to obtain 7„ ; . Define 
P„(A7) = {-5g„(/3,7)/5(/3T,7^)}(,,+,,j„)x(d.+d.j„)> 



^n(/3,7) = 5^DTA,(/3,7)Vri(/3,7)A,(/3,7)D, 



{di+d2Jn)x{di+d2J„) 



3. Asymptotic properties of the estimators. For any s x s symmetric 
matrix A, denote by Amin(A) and Amax(A) its smallest and largest eigen- 
values. For any vector a = (ai, . . . ,as)'^, let its Euclidean norm be ||q!|| = 

y^a'i H haj. Let C^'^i^^,) be the space of Lipschitz continuous functions 

on that is, 



CO'^(Af.) = {^:|M|o,= sup '^(f -^;"^^' <+oo 

I wjtw',w,w'GX^ \W -W \ 

in which ||(/5||o,i is the C^'^-norm of ip. Throughout the paper, we assume 
the following regularity conditions: 

(CI) The random variables Z^ji are bounded, uniformly in 1 < j < rui, 
1 <i <n, 1 <l < d2- The marginal density fiji{-) of Ziji is bounded away 
from and oo on [0, 1], uniformly in 1 < j < mj, 1 <i <n. The joint density 
fijlj'l'i', •) of {Ziji, Zijui) is bounded away from and oo on [0, 1], uniformly 
in 1 < z < n, 1 < j,j' < rrii, and 1 <l ^ I' < d2. 

(C2) The eigenvalues of the true correlation matrices Rj are bounded 
away from 0, uniformly in 1 <i <n. 
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(C3) The eigenvalues of the inverse of the working correlation matrices 
Yii{a)~^ are bounded away from 0, uniformly in l<i<n. 

(C4) Let riT = Yll=i'^i- There are constants < c < C < oo, such that 



CUT < Amin(Er= 



<A„.ax(Er=lX/XJ<C^^T. 



(C5) For 1 < / < d2, ^\zi) e C°'^[0,1], for given integer p>l. The 



spline degree satisfies q - 
knots Nn — )■ oo, as ?iT - 



'I 

-l>p, and fi'{r]) G C" 
■ oo. 



0,1 



(Xri). The number of interior 



Conditions (C1)-(C4) are similar to conditions (Al)-(A4) in [21], and 
condition (C5) is weaker than the first part of condition (A5) in [21]. Let 
/3o be the true parameter vector and ^;o(') be the true Ith additive function 
in model (1). According to the result on page 149 of [3], for 9iq{-) satisfying 
condition (C5), there is a function 



Jn 



(8) 



such that sup^^g[o^i] \8ioizi) - 
s = l,..., J„,/ = l,...,d2)'^, 



Oio{zi) \ = 0{Jn^). Thus, by letting 7o = (7, 



sup 

zG[0,l]'*2 



B(z)T7o 



d2 

E 

1=1 



d2 



sLO '■ 



sup \euzi)-eio{zi)\=o{d2j-n- 



In addition to the regularity conditions above, we need extra conditions 
to ensure the existence and weak consistency of the estimators in (4). Let 
= mini<,<„An,i„{R£i(a)}, A^" = maxi<,<„ A„,ax{Rr'(")}' C = 
maxi<i<„,{Amax(R.j~^(a)R.i)} and r™"" = mini<i<„{Ai, 
additional conditions are as follows: 

jl/2 



(Rr^(a)Ri)}. The 



(Al) (Ar7C^-)nT/Jn' ^00. 
(A2) There is a constant cq > 0, for any r > 0, such that P{Pn(/3,7) > 
co*n(/3o'7o) and P„(/3,7) is nonsingular, for ah (/3'^,7'^)'^ G Cn{r)} 1, 



{(/3 



^n(/3o,7o)r/'((/3-/3o)^ (7-7o)MMl < 



where (r) 

Conditions (Al) and (A2) are used to ensure the existence and weak 
consistency of the solutions in (4). Condition (A2) corresponds to condi- 
tion (L^) in [33] for generalized linear models. Conditions (Al) and (C4) 
imply condition (I^) in [33], which will be proved in the Appendix. Condi- 
tion (A2) relates to the true and the working correlation structures Rj and 
Ri(a). Since the true correlations Rj are often not completely specified and 
maxi<j<„ Amax(Ri) ^ '^(n)) then condition (Al) is implied by 

(Al*) (Ar7Ar")m-inT/Jn/'^oo. 
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Condition (Al*) does not contain Rj. Thus, the order requirements of 
n, and Jn depend on the choice of the working correlations Rj(a). 

For instance, if the working correlation structures are independent or AR(1) 
within each subject, then there exist constants < c/j < Cr < oo, such that 
Ci? < (A^^^)-U5^*° < Cij. Thus, condition (Al*) is equivalent to m^J^nx/ 

1/2 

Jn — )• OO. For exchangeable working correlation structures, there exist con- 
stants < C'j^ < oo, such that maxi<i<„ Amax{Rj(a)} < C'^'^(n)i then 
i^n^'^y^^T^ > c'^m^^y for some constant < < oo. Condition (Al*) 
is implied by niT^ni^: / Jn"^ — )• oo. 

Theorem 1. Under conditions (Al) and (A2) or (Al*) and (A2), as 
TiT — ?• oo, there exist sequences of random variables /3„ and 7„, such that 

PiSniPn^ln) = 0} ~^ 1; c^"'^ llAi ~ /^oll ~^ ^ ^'^'^ IIT" ~ 7oll " ^ m probabil- 
ity. 

Next we derive the asymptotic properties of /3„. Let X and Z be the collec- 
tions of all Xjjfc's and Zijis, respectively, that is, Af^rpxdi — (^^-f) • ■ ■ j^^n)""" 
and Z„^xd2 = (^i") • • • 1 Z^)"""- Let Aj be the diagonal matrix with the di- 
agonal elements being the first derivative of //(•) evaluated at X^/3q + 

YAti^ioi^iji)^ j = l,...,mj, and Vj = Aj^^Ri(Q;) A^^^ with Aj being the 
marginal variance of Yj evaluated at the true parameters and additive func- 
tions. To make j3 estimable, we need a condition to ensure X and Z not 
functionally related, which is similar to the condition given in [21]. De- 
fine the Hilbert space % = {V'(z) = Yl'i=i'^i(.^i)^ -^''Pii^i) — 0) llV'/lb < 00} 
of theoretically centered L2 additive functions on [0,1]"^^, where HV'zlb = 
{Jq tpf{zi) dzi}^^'^ . Let ip"^ be the function ip G T-L that minimizes 

EILi ^[{Xf ^ - V'(Z,)}TA,Vri A,{xf ) - ^{ZM where X^'^ = {Xak, ■ ■ • , 
Ximik)"^ ■, 1 < k < di. Some other assumptions needed are given as follows. 

(A3) Given 1< A;<di, ipl^'"''' gC°'1[0,1], for l<p<g + l. 

The order requirements of the number of interior knots N and A^*^ in 
steps I and II are given in the following assumption: 

(A4) (i) v^(lognT)A^'5/nT(C^^/Ar°)V2 = o(l), {N^)-P~^/^n^^{X^^''/ 

A--)(ArvC)'/' = o(i), and (ii) (Arv^r")'/'(ArvAr'^)'(nT/ 

Ar5)l/2^r-P = 0(1), (A^;f^^/r™'^)V2(^max/^min)2QQg^^/^5)l/2 ^ q^^-)^ 

{Nf,NnXlognTy/^n~^ = o{l). 

Since A™'"^ < < r^^^ < m(„)A™'^^, condition (A4) is implied by a 
stronger condition as below: 
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(A4*) (i) V^(lognT)iV'5/j^^j„J/2(^max/^mm)l/2 ^ ^^^^ ^ ^ 

"-T^^rVAr'')^/^ = 0(l), and (ii) (Ar''/Ar'')^^^(^T/iV'^)^/^iV~^ = o(l), 
(ArVAr'')^/^(lognT/iV^)^/2 = o(l), {N^NnlognTy/^n^^ = o{l). 

Condition (A3) is weaker than the second part of condition (A5) in [21]. 
Condition (A4*) does not depend on the true correlation matrices Rj, which 
are not specified. It is clear that the first conditions in (A4) and (A4*) ensure 
conditions (Al) and (Al*), respectively. 

Remark 1. (A4)(i) lists the order requirements for to obtain the 
asymptotic results of the oracle estimator in Theorem 3. (A4)(ii) ensures 
the uniform oracle efficiency of the two-step spline estimator. It will be 
shown in Theorem 4 that the difference between the two-step spline and the 
oracle estimators is of uniform order Op{(A™'^^/A™™)^(J;7'^ -|- y^log nT/nx)} 

with Op{ (A^^^/Ar'') Vlog /tit} and Op{(Ar^/A™°)2 J^T^} caused by 
the noise and bias terms, respectively, in the first step spline estimation. 
The inverse of the asymptotic standard deviation of the oracle estimator 
is of order 0{ y^riT /J^ ( A^^^/r™") ^2 } . The first two conditions of (A4)(ii) 
ensure that the difference is asymptotically uniformly negligible. If we let 
A'^ have the order n^^'^^\ then the difference is of uniform order Op{{X^^^ / 
A™™)^ YTognr/nr}. Therefore, an undersmoothing procedure is applied in 
the first step to reduce the bias. When A™"^, A™'*^, r™™ and r™'^^ are finite 

numbers, (A4)(i) becomes -y/ (lognx)A^'^/nT = o(l) and {N^)~P~^/'^n}^'^ = 
0(1). The optimal order of is 'n}^^'^^^^\ Define 

Xjfc = Xj-'^'' — V'fclZj), l<k<di, Xj = (Xji, . . . ,XjrfJ.m^x(ii- 

Define = A^Vr^ Ai%, = ^Hi ^^^Vris.Vri A,X, and 

(9) En = {E{^n)r'E{^„){E{^n)r^. 

The following result gives the asymptotic distribution and consistency 
rate of /3„ for general working covariance matrices. 

Theorem 2. Under conditions (A2)-(A4), as nx — )• oo, H„ ' — 
/3o) ^ Normal(0,I,J, and \\X " PoW = Op{n-'/'(C-)i/2(A--)-V2}. jf 
condition (A4) is replaced by (A4*), then 

0n -M= o,{n-^/^m;/j(Ar^)^/2(Ar)"'/'}- 

Remark 2. It is easy to show that the covariance H.„ in (9) is min- 
imized when the working covariance matrices are equal to the true co- 
variance matrices such that Vj = Sj for all 1 < i < n, and in this case 
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equal to To construct the confidence sets for /3, H„ is con- 

sistently estimated by H„ = where = ELi^i AiVr^A^Xi, 

$n = Er=iti'A,Vris,VriA,X,, and X, = X, - Proj^* X„ ^ = 1, • ■ • ,n, 
in which Proj^* is the projection onto the empirically centered spline inner 

product space and 5]j is a consistent estimator of Xlj. 

For I <l <d2, let = (7s/,o)s=i) with 75(^0 defined in the same fashion 
as given in (8), and 6_iq = {9i'o{-), 1 < ^' < ^2, ^' / 0- Define 

K,ihf,o) = I X](Bfj^A,(/3o,0„io,7fo)Vr'(/9o,e-zo,7fo) 

U=i 



X A,(/3o,6>„io,7Lo)B 



,5 ^T35 

i-i 



In order to ensure the existence and uniformly weak convergence of the 
oracle estimator 9^ i{zi, ^qjO^iq), we need the following conditions: 

(A5) For 1 < / < there is a constant q > 0, for any r > 0, such that 
P{K,lhf) > Ci^:,;(7fo) and V*^^ihf) is nonsingular, for all -ff G Cn(r)} ^ 
1, where Ur) = iff : ll{*;^(7fo)}'/'(7f " 7fo)ll < (rr^)'/^}. 

For 1 < / < d2, define = ,)}-ii?(<I>; where 



.-Dj-/; ^i^i ^-jr"^ 



Theorems. lei 6*^0(2;) = £'{^f^;(zi,/3o,^-w)|'^,^}- Under conditions 
(A3), (A4)(i) and (A5), for 1 < I < d2 and zi € [0, 1], as n^-^ 00, 

(Bf (zO^H;,Bf (z,))"^/^{?f/z,,/3o,0_/o) - ^ro(^0} N{0, 1), 
sup |^f/z^/3o,0-w)-ero(^OI=Op{V(log^T)J;?/nT(rr7Ar)'/'}, 

sup |0ro(^o - = op{(Ar7Ar 

2;G[0,1] 

anrf i/iere are constants < q = < Q^h < OO; sttc/i i/iai for all zi £ [0, 1], 



(11) 



{Bf ,Bf (z,)}^^' < Q,HA/'^n^/^T(rr7Ar) 
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Replacing (A4)(i) by (A4*)(i), 

Op{ ^(lognT)J^m(„)/nT(ArVAr") '^'} ■ 

Remark 3. Pointwise confidence intervals for Oiq{zi) can be constructed 
based on the results in Theorem 3. By (10) and (11), the bias term in (10) 
is asymptotically uniformly negligible through undersmoothing if 

(Ar'S)-p-l/2„V2^;S^max/^mm)^^max/^min)l/2 ^ ^(j)^ rj,^^^^ ^he form 

[(A}f^/A^;f''^)2(AJ^^^/T;f*'i)nT]^/(2P+^)iV*, where the sequence N* satisfies 
A^* — ;> oo and n^'^ N* — ;> for any r > 0. Under (A4*)(i), A^*^ is of the form 

[(A;f^^/A^;f'°)3nT]^/^^P+^^A^*. 

Theorem 3 presents asymptotic normality and uniform convergence rate 
of the oracle estimator 9^ ^{zi, Pq,6^iq). The oracle estimator achieves the 
convergence rate of univariate spline regression function estimation. Refer- 
ences [35] and [13] studied asymptotic normality of spline estimators for non- 
parametric regression functions with i.i.d. data. Reference [14] established 
the asymptotic distribution for the univariate spline estimator in partially 
linear models for clustered data with m(„) < oo. Reference [13] discussed the 
difficulty of obtaining asymptotic normality of spline estimators for additive 
models. Reference [21] studied convergence rate of the one-step additive 
spline estimator for clustered data with m(„) < oo, but it lacks the limit- 
ing distribution. The next theorem will present the uniform convergence 
rate of the two-step spline estimator 9^ ^{zi, P,On^-i) to the oracle estimator 
9^ i{zi, f3Q,0„iQ), and establish the asymptotic normality of 9^i{zi,f3,On^-i). 

Theorem 4. Under conditions (A2)-(A5), for l<l <d2, 

sup \9li{zi,p,dn,-l)-'^nM'P0'^-l0)\ 

(12) = Op{(Ar7Ar)'(v/lognT/nT + J"^)} 

= o,{{4/nT)'/\rfyxr''f'} 
and replacing (A4) by (A4*) , 

sup \9li{zi,p,0n,-l)-'^n,li^hP0^^-l0)\ 
2iG[0,l] 

= Op{{J^/nTf\xr'/XT''f'}. 
Hence, for 1 <l <d2 and zi G [0, 1], as nx — ?• oo, 

(Bf(z,)^H;,Bf(zO)"'/'{^f,/(^/,3,^n,-/)-ero(^0}^A^(0,l). 
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Remark 4. Similarly as in (9), H* ^ is minimized when Vj = Sj 
for all 1 <i <n, and in this case is equal to ;)}^^. To construct a 

pointwise confidence interval for 9io{zi) at zi G [0, 1], H* ^ is consistently esti- 
mated by % = K:'K,iK:^ where = Er=i(Bfi)^A,Vri A,Bf, and 
K,i = Er=i(Bf;)^AiVriSiV-^AiBf,. Then under the assumption given 
in Remark 3, for any a G (0, 1), an asymptotic 100(1 — a)% pointwise con- 
fidence interval for Oio{zi) is 

(13) eliizi,dX,-i)±z^/2iBfizifE*^^iBf{zi)f\ 

Remark 5. By letting N have order n!^^'^^\ the difference in (12) is of 
uniform order Op{ (A^'^^/A™™)^ y^log nx /'^t} • So undersmoothing is applied 
to reduce the approximation error caused by the bias in the first step. 

4. Simulation. In this section we conduct simulations to illustrate the 
finite-sample behavior of the proposed GEE estimators for both normal and 
binary responses. For each procedure, we consider three different working 
correlation structures: independence (IND), exchangeable (EX) and first or- 
der auto-correlation (AR(1)). For notation simplicity, denote the two-step 

spline estimator 9^i{zi,P^,dn-i) defined in (7) as 0^^i{zi) = Bf {zi)^j^'^i , 

and the oracle estimator e^i{zi, (3,0^1) in (5) as O^fi^i) = ^f(.ziV^nf- In 
the first step, the pilot estimators are obtained by an undersmoothed spline 
procedure to reduce bias. By the order requirements of the number of in- 
terior knots, we select a relatively large by letting N = [2n}^^'^^^], where 
[a] denotes the nearest integer to a. In the second step, A^'^ is selected from 
the interval Ij^s = [[an], [5a„]], a.„ = (nTlognT)^/^^^~'~^\ minimizing the BIG 
criterion 

(14) BIG(iV^) = \og{2Q;,/jli)/n} + 4log(7i)/n, 

where ^^^7^,^) = 2"^ E:=i(Y, - nf'^^HX,^n,-l,lil){Y, - with 
= M(Xi3n + Ef'Li,zYi^n,r(Ziz') + (Bf,)'^7f ;). The optimal number of 
interior knots is chosen as A^*^ = argmin^5£/^^BIG(A^'^). We use cu- 
bic B-splines {q = 3) to estimate the additive nonparametric functions. We 
generate nsim = 500 replications for each simulation study. 

Given 1 < I < d2, to compare the performance of the two-step estima- 
tor 0^'^{zi) with the pilot spline estimator 9n,i{zi) and the oracle estimator 

^nfi^i)^ we define the mean integrated squared error (MISE) for 
9i^{zi) as MISE(^ff) = X:^lTlSE(^f|j, where ISE(^f|j = 
^i'Er=iEr=i(^ga(^.j7,a) - OiiZiji^a)?, and is the estimator of 9i 

and Ziji^a is the observation of Ziji in the ath sample. The MISEs for 9n,iizi) 
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and e°f{zi) denoted as MISE(a„,/) and MISE(0^R) 

are defined in the same 

way. Tlie empirical relative efficiency for the two-step estimator in the ath 
sample is defined as eSi^a = {ISE(^f|^)/ISE(^Oj|'^)}V2. To construct con- 
fidence intervals for coefficient parameters (/3o,0) • ■ • )/3o,(di-i)) by using the 
first result in Theorem 2 and to construct pointwise confidence intervals for 
the Ith nonparametric function 6io{zi) given in (13), the true correlation 
matrix R is consistently estimated by 

n 
i=l 

x[Y,-/i{^^(3„,7n)}]^A-^/'(3„,,7n)- 

^ 1 /2 ^ 1/2 

And the covariance matrix S j is estimated by S j = A • RA^ . Let /3o = 

(/3o,fc)i=o ^^^11"^ (^n — (/^n,fc)i:=o ' evaluating estimation accuracy of each 
coefficient parameter, we report the root mean squared error (RMSE) de- 
fined as {Ea=T(^n,fc - /3o,fc)Vnsim}i/2, for < A: < (ii - 1, where is the 
estimate of (3^^^ obtained from the ath sample. 

Example 1 (Continuous response). The correlated normal responses are 
generated from the model Yij = + 9i{Ziji) + 6*2(^1^2) + ^3(^1^3) + 
where /3 = (1, -1,0.5), = (X^,- i,Xi,-2,Xi,-3)T, BiiZi) = sin(27rZ0, 1 < / < 
3. For the covariates, let Ziji = $(Z* ;), 1 < / < 3, with Z*j = {Z*-^,Z*-^, Z*-^)^ 
generated from the multivariate normal distribution with mean and an 
AR(1) covariance with marginal variance 1 and autocorrelation coefficient 
0.5, Xij^i = ±1/2 with probability 1/2, and (X^j- 2, Xjj- a)"^ ~ N[(0,0)'^, 

diag(a(Zjji), a(Zy2))] with a{z) = The error term = (e^, . . . , 

^imi)"^ is generated from the multivariate normal distribution with mean 0, 
marginal variance 1 and an exchangeable correlation matrix with parameter 
p = 0.5. We let n = 250 and cluster size mi = m = 20,50, 100, respectively. 
For computational simplicity, we choose the same cluster size for each sub- 
ject. The computational algorithm can be easily extended to the case with 
varying cluster sizes. Table 1 lists the empirical coverage rates of the 95% 
confidence intervals of the estimators (/3n,fc)fc=i for coefficients (/3o,fc)|=i5 the 
RMSE and the absolute value of the empirical bias denoted as Bias for IND, 
EX and AR(1) and m = 20, 50, 100. 

The empirical coverage rates are close to the nominal coverage probabili- 
ties 95% for all cases. The results are confirmative to Theorem 2. EX has the 
smallest RMSE, since it is the true correlation structure, which leads to the 
most efficient estimators (Remark 2). The RMSEs decrease as cluster size 
increases for all three working correlation structures. The last three columns 
show that the empirical biases are close to zero for all cases. 
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Table 1 

The empirical coverage rates of the 95% confidence intervals for (/3o_fe)|^]^, the RMSE 
and Bias for the IND, EX and AR(1) working correlation structures with m = 20, 50, 100 







Coverage frequency 




RMSE 






Bias 




m 




/3o,i 


/3o,2 


/3o,3 


/3o,i 


/3o,2 


/3o,3 


/3o,i 


/3o,2 


/3o,3 


20 


IND 
EX 
AR(1) 


0.948 
0.954 
0.936 


0.956 
0.950 
0.954 


0.950 
0.948 
0.956 


0.0279 
0.0196 
0.0260 


0.0137 
0.0098 
0.0123 


0.0137 
0.0108 
0.0121 


0.0050 
0.0018 
0.0026 


0.0002 
0.0000 
0.0003 


0.0008 
0.0006 
0.0011 


50 


IND 
EX 
AR(1) 


0.948 
0.946 
0.944 


0.952 
0.950 
0.956 


0.948 
0.948 
0.948 


0.0177 
0.0126 
0.0157 


0.0092 
0.0063 
0.0079 


0.0091 
0.0066 
0.0081 


0.0006 
0.0002 
0.0003 


0.0001 
0.0001 
0.0002 


0.0009 
0.0002 
0.0003 


100 


IND 
EX 
AR(1) 


0.948 
0.950 
0.946 


0.956 
0.954 
0.954 


0.958 
0.948 
0.956 


0.0126 
0.0084 
0.0111 


0.0063 
0.0044 
0.0056 


0.0064 
0.0045 
0.0055 


0.0001 
0.0001 
0.0001 


0.0003 
0.0002 
0.0004 


0.0002 
0.0001 
0.0001 



Table 2 shows the MISE(xlO~^) for the two-step sphne estimator 6*^1 (•), 

the pilot estimator 9n,i{-) and the oracle estimator ^^f'(-)) ^ = 1)2,3, for 

IND, EX and AR(1) structures and cluster size m = 20,50, 100. and 

^nf (■) have similar MISE values, while On,i{-) has the largest MISE value. 
The EX structure has the smallest MISEs, and the MISEs decrease as the 
cluster size increases. 

We plotted the kernel density estimates in Figure 1 of 500 empirical effi- 
ciencies eff^^a for the estimators of the first function 9i{-) for IND (dashed 
lines), EX (thick lines) and AR(1) (thin lines) structures with m = 20,50 



Table 2 

The MISE{xlO-^) for ef_f (■), 0„^i{-) and e^f{-), I = 1,2,3, for the IND, EX and AR(1) 
working correlation structures with m = 20, 50, 100 



m 




aSS 
t'n,! 


0n,l 


30R 


Sss 

t>n,2 






qSS 




20R 


20 


IND 

EX 
AR(1) 


1.678 
0.883 
1.249 


2.231 
1.228 
1.710 


1.588 
0.836 
1.186 


1.659 
0.943 
1.324 


2.278 
1.232 
1.790 


1.517 
0.848 
1.205 


1.516 
0.849 
1.252 


2.118 
1.167 
1.713 


1.448 
0.811 
1.182 


50 


IND 

EX 
AR(1) 


0.633 
0.342 
0.473 


0.862 
0.463 
0.664 


0.601 
0.328 
0.459 


0.677 
0.348 
0.513 


0.927 
0.475 
0.690 


0.608 
0.321 
0.478 


0.631 
0.353 
0.486 


0.881 
0.465 
0.679 


0.601 
0.335 
0.464 


100 


IND 

EX 
AR(1) 


0.319 
0.173 
0.247 


0.440 
0.234 
0.333 


0.306 
0.166 
0.235 


0.346 
0.176 
0.252 


0.461 
0.237 
0.348 


0.317 
0.162 
0.230 


0.315 
0.172 
0.244 


0.436 
0.227 
0.338 


0.299 
0.164 
0.232 
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Fig. 1. Kernel density plots of the 500 empirical efficiencies of the two-step estimator to 
the oracle estimator of the first function 9i{-) for the IND (dashed lines), EX (thick lines) 
and AR(1) (thin lines) working correlation structures with m = 20,50. 

and n = 250. The vertical line at efficiency = 1 is the standard line for the 
comparison of the two-step estimator (7) and the oracle estimator (5). The 
centers of density distributions are close to 1 for all working correlation 
structures, and EX has the narrowest distribution. 

Example 2 (Binary response). The correlated binary responses {^j} 
are generated from a marginal logit model 

logit P{Yij = 1|X,„ Zi,) = + OiiZiji) + e2{z,,2), 

where /3 = (0.5, -0.3, 0.3), X^j = (1, X^j- 1, X^j- a)"^, 6*1(^1) = 0.5 x sin(27rZi), 
and 02(22) = —0.5 x {Z2 — 0.5 + sin(27rZ2)}. For the covariates, we generate 
Xijk and Ziji independently from standard normal and uniform distribu- 
tions, respectively, such that Xijk ~ N(0, 1) and Ziji ~ Uniform[0, 1]. We use 
the R package "mvtBinaryEP" to generate the correlated binary responses 
with exchangeable correlation structure with a correlation parameter of 0.1 
within each cluster. We let the number of clusters be n = 100,200,500, re- 
spectively, and let the cluster size be equal and increase with n, such that 
"^(n) =nT'i = [2n^/^J , for 1 <i <n, where [aj denotes the largest integer no 
greater than a. So m = 20, 28, 44 for n = 100, 200, 500, respectively. Table 3 
shows the empirical coverage rates of the 95% confidence intervals of the 
estimators {f3n,k)k=o ^'^^ coefficients {Po,k)1=o and the RMSEs for IND, 
EX and AR(1) and n = 100, 200, 500. Table 4 shows that the empirical cov- 
erage rates are close to the nominal coverage probabilities 95% for all cases. 
EX has the smallest RMSE values, and the RMSEs decrease as n^increases. 
Table 4 shows the MISE for the two-step spline estimator 0^'f{-), the 

pilot estimator 6n,i{') and the oracle estimator 0^f'{-), 1 = 1,2, for the IND, 
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Table 3 

The empirical coverage rates of the 95% confidence intervals for (/3o,fe)fe=o 
estimated MSE for the IND, EX and AR(1) working correlation structures with 

n= 100,200,500 



Coverage frequency RMSE 











/3o,o 


/3o,i 


/3o,2 


/3o,o 


/3o,i 


/3o,2 


n 


= 100, m = 


20 


IND 


0.960 


0.946 


0.940 


0.0821 


0.0549 


0.0506 








EX 


0.940 


0.946 


0.946 


0.0763 


0.0469 


0.0454 








AR(1) 


0.966 


0.930 


0.940 


0.0773 


0.0540 


0.0488 


n 


= 200, m = 


28 


IND 


0.944 


0.946 


0.940 


0.0559 


0.0299 


0.0328 








EX 


0.948 


0.952 


0.942 


0.0554 


0.0289 


0.0310 








AR(1) 


0.940 


0.950 


0.940 


0.0556 


0.0291 


0.0325 


n 


= 500, m = 


44 


IND 


0.952 


0.946 


0.942 


0.0340 


0.0157 


0.0154 








EX 


0.948 


0.952 


0.946 


0.0336 


0.0136 


0.0142 








AR(1) 


0.952 


0.952 


0.942 


0.0340 


0.0153 


0.0153 



EX and AR(1) structures and n = 100, 200, 500. The MISE values for f (•) 

and Onf'i') close and 9n,i{-) has the largest MISE values. EX has the 
smallest MISEs among the three working correlation structures, and the 
MISEs decrease as n increases. 

For visualization of the actual function estimates, in Figure 2 we plotted 
the oracle estimator given in (5) (dashed curve), the two-step estimator given 
in (7) (thick curve) and the 95% pointwise confidence intervals constructed 
in (13) (upper and lower curves) of ^i(-) (thin curve) for n = 200 based on 
one simulated sample. The proposed two-step estimator seems satisfactory. 



Table 4 

The MISE for ^f.f (■), and e^f{-), 1 = 1,2, for the IND, EX and AR(1) working 

correlation structures with n = 100, 200, 500 



n 




Sss 




20R 
t>n,l 


Sss 

t>n,2 


dn,2 


SOR 
t>n,2 


100 


IND 


0.0172 


0.0243 


0.0174 


0.0158 


0.0222 


0.0159 




EX 


0.0148 


0.0223 


0.0148 


0.0139 


0.0204 


0.0137 




AR(1) 


0.0178 


0.0265 


0.0176 


0.0161 


0.0234 


0.0163 


200 


IND 


0.0059 


0.0086 


0.0059 


0.0056 


0.0082 


0.0056 




EX 


0.0048 


0.0069 


0.0048 


0.0054 


0.0075 


0.0053 




AR(1) 


0.0058 


0.0085 


0.0058 


0.0056 


0.0081 


0.0056 


500 


IND 


0.0015 


0.0022 


0.0015 


0.0015 


0.0021 


0.0015 




EX 


0.0013 


0.0019 


0.0013 


0.0014 


0.0019 


0.0013 




AR(1) 


0.0015 


0.0022 


0.0015 


0.0015 


0.0020 


0.0014 
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Fig. 2. Plots of oracle estimator (dashed curve), the two-step estimator (thick curve) 
and the 95% pointwise confidence intervals (upper and lower curves) of 9i{-) (thin curve) 
for n = 200. 

5. Application. In this section we apply tlie proposed estimation proce- 
dure to analyze unemployment-economic growth and employment relation- 
ship at the U.S. state level for the 1970-1986 period. Reference [2] has first 
studied the effect of economic growth on unemployment rate by establishing 
a parametric unemployment-growth model. They concluded that relatively 
high economic growth is more likely to show reduced unemployment rates 
when compared to states in which the economy is growing more slowly by 
obtaining a negative coefficient for growth. Reference [27] demonstrated a 
strong negative correlation between the change of unemployment rate and 
employment. We restudy their relationship by considering possible nonlin- 
ear relations of the unemployment rate with economic growth and time. The 
economic growth rate is calculated from the logarithm difference of the gross 
state product (GSP). The data for the unemployment rate, gross state prod- 
uct and employment are available for the U.S. 48 contiguous states over the 
period 1970-1986. Details on this data set can be found in [24]. The number 
of time periods for each state in estimation is m = 16, since the year 1970 is 
taken as the initial observation. We consider the following GAPLM: 

U^J =Po + hEij + Oi{T,j) + e2{Gij) + j = 2, . . . , 17, i = 1, . . . , 48, 

where Uij is the change in the unemployment rate for the jth year in the 
ith state, Eij is the empirically centered value of the relative change in 
employment, Gij is the GSP growth, and Tij is time, ^i(-) and 02(') are 
nonparametric functions of time and GSP growth, respectively. 

To test whether ^/(•), I = 1,2, has a specific parametric form, we construct 
simultaneous confidence bands according to Theorem 2 of [28] . For any a £ 
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JTable 5 

The estimated values Po and Pi of /3o and /3i and the 
standard errors SE{Po) arid SE(Pi) for the IND, EX and 
AR(1) working correlation structures 







SE(^o) 


/9i 


SE(^i) 


IND 


0.127 


0.0417 


-0.219 


0.0230 


EX 


0.127 


0.0494 


-0.249 


0.0220 


AR(1) 


0.127 


0.0484 


-0.250 


0.0216 



(0,1), an asymptotic 100(1 — a)% conservative confidence band for 9io{zi) 
over tlie domain of zi is given as 

eli{zi3A,-i) ± {21og(iV^ + 1) - 2loga}'/\Bf{zifEliBf{zi))'^^ 

with 6^ I obtained by linear splines with degree q = l. We use linear splines 
in both steps of estimation. 

We use three working correlation structures to analyze this data set, in- 
cluding the working independence Rj ( Of) — Imi where is an m x m iden- 
tity matrix, the exchangeable Rj(a) = a x 1^1^ , + (1 — a)Irn, where Im is 
the m-dimensional vector with I's, and the AR(1) Ri(a) = (-Rijj')j j'=i with 
Rijj' = a'-^"-^'! . The parameter a is estimated by the R package geepack from 
the first spline estimation step. We obtain the estimated values for a which 
are a = 0.088 for the EX structure and a = —0.199 for the AR(1) structure, 
respectively. Table 5 shows the estimated values f3o jxnd j3i of /3o and j3i and 
the corresponding standard errors SE(/3o) and SE(/3i) for the three working 
correlation structures. The estimation results are very similar for the three 
structures. The negative values of f3i imply a negative relationship between 
Uij and Eij, confirmative to the result in [27]. Both of the estimators are 
significant with p-values close to for the three different working correla- 
tion structures. The correlation coefficient r = 0.785, 0.822 and 0.762 for the 
IND, EX and AR(1) structures, respectively. 

Figure 3 displays the two-step spline estimators 0^^{-) (dashed lines) and 

2(") (dashed lines) of 9i{-) and 92{-) and the corresponding 95% pointwise 
confidence intervals (thin lines) and simultaneous confidence bands (thick 
lines) for the three working structures. Figure 3 shows that the change pat- 
terns of Uij with Tij and Gij are very similar for the three working structures. 
In the upper panel of Figure 3, we can observe a declining trend for 0^'i{-) 

in general. The values of ^f'i(-) were all positive before the year 1976, which 
means that the unemployment rate was increasing with time during that 
period. The increasing unemployment rate was caused by a severe economic 




-10 -5 5 10 15 -10 -5 5 10 15 -10 -5 5 10 15 CO 

GSP growth GSP growth GSP growth 



Fig. 3. Plots of the two-step spline estimated functions (dashed line), the 95% pointwise confidence intervals (thin lines) and the 95% 
confidence bands (thick lines) for Oi{-) (upper panel) and 92{-) (lower panel), and the GEE estimator of ^2(-) by assuming linearity 
(straight solid line). 
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recession that happened in the years 1973-1975. A local peak of O^'ii') is 
observed around 1980, when another recession happened. 

In order to test the linearity of the nonparametric function 62 , we plotted 
straight solid lines in the lower panel of Figure 3, which are the regression 
lines obtained by solving the GEE in (6) by assuming that ^i(-) is a linear 
function of GSP growth. All the three plots in the lower panel of Figure 3 
show that the confidence bands with 95% confidence level do not totally 
cover the straight regression lines, that is, the linearity of the component 
function for GSP growth is rejected at the significance level 0.05. The lower 
panel of Figure 3 indicates a general negative relation between the GSP 
growth and the change in unemployment rate. 

6. Discussion. In this paper we propose a two-step spline estimating 
equations procedure for generalized additive partially linear models with 
large cluster sizes. We develop asymptotic distributions and consistency 
properties for the two-step estimators of the additive functions and the one- 
step estimator of the parametric vector. We establish the oracle properties 
of the two-step estimators. Because the two-step estimator is a mixture of 
two different spline bases, and an infinite number of observations within 
clusters are correlated in complex ways, we encountered challenging tasks 
when developing the theories. We demonstrate our proposed method by two 
simulated examples and one real data example. Our proposed method can be 
extended to generalized additive models and generalized additive coefficient 
models, and it provides a useful tool for studying clustered data. The theoret- 
ical development in this paper helps us further investigate semi-parametric 
models with clustered data. In the real data example, we constructed confi- 
dence bands to test the linearity of the nonparametric function. To establish 
confidence bands with rigorous theoretical proofs will be our future work. 

In this paper we focus on the two-step spline estimation procedure, which 
is computationally expedient and theoretically reliable. As mentioned in Sec- 
tion 2, that kernel smoothing method can be applied to the second step. 
Let i^h(-) be a kernel weight function, where Kh{z) = h~^K{z/h) with 
bandwidth h. Let Gi{zi) = {\,zij^ . If we use local linear kernel estima- 
tion, then by assuming that (3 and 0„/ are known, 9i{-) is estimated by the 

oracle estimator df^{Zi) = Gi{Zi - ^^'^7?^ at any given point Zi, where 
T?^ — (710^' 7^1^)'^ with "yP^ solving the kernel estimating equations 

n 

^Ga(z/)^A,(/3,0_^,7P^)Vri(/3,0_;,7pi^)K,,(z,) 

i=l 

I \ i'=i,i'j^i J ) 
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where Kih{zi) = diag{Kh{Ziji - zi)} and Gii{zi) = {GiiZm - zi),..., 
Gi{Zim.i — zi)}"^ . Then 6i{zi) is estimated by 6'P^(2;;) =J^^- The two-step 
sphne backfitted kernel (SBK) estimator df^^(zi) is obtained by replacing /3 
and 0_i with the pilot estimators /3„ and 9n-i from step I. The asymptotic 
normality of the oracle estimator 6f^{zi) which is a pure local linear kernel 
estimator of 9i{zi) by GEE can be obtained following the same idea in the 
proofs for Theorem 3 and the results in [16] for kernel estimators using GEE. 
The uniform oracle efficiency of the SBK estimator 9f^^{zi) is achievable by 
following the same procedure as the proofs for Theorem 4 and by studying 
the properties of spline-kernel combination. See [20, 29] and [22] for the ora- 
cle properties of the SBK estimators in additive models, additive coefficient 
models and additive partially linear models with weekly-dependent data and 
a continuous response variable. The asymptotic distributions and the oracle 
properties of the SBK estimators for GAPLMs with large cluster sizes still 
need us to explore as future work. 

APPENDIX 

We denote by the same letters c, C, any positive constants without dis- 
tinction. For any s x s' matrix M, let ||M||oo = maxi<j<s ^^_^|Mjj|. For 
any vector a = (ai, . . . , a^)^, denote || a ||oo = niaxi<j<jj |ai| as the maxi- 
mum norm. Let Is be the s x s identity matrix. Let n„, n„ denote, re- 
spectively, the projection onto relative to the empirical and the the- 
oretical inner products. For any function 4>, define the empirical norm as 
\\4>\\nrj, = i^T^^7=iYlT=i4>i^iji^ij)'^- For positive numbers a„ and 6„, let 
ci-n ^ bn denote that lim„_j.oo dn/bn = c, where c is some nonzero constant. 

A.l. Proof of Theorem 1. It can be proved following the similar rea- 
soning as in [21] that under condition (Al) with nx — oo, J„ — ?• oo, and 
J„n~^ = o(l), there exist constants < c' < C < oo, such that with proba- 
bility 1, for nx sufficiently large, 

and 11X^^=1 XjBj II oo = Oa.s.{('T'T lognx)^/^}. By these results together with 
condition (C4), one has with probability 1, 

(15) c"nT < Xrrnn ^J^i^ ^ ^^ax ^^^i^ ^ '^"^T 

for some constants < c" < C" < oo. Then by condition (A2), 

(C-)-iA„,in{M/„(/3o,7o)} > cc"(T:r^)-^ArnT ^ oo. 



22 



S. MA 



Results in Theorem 1 can be proved similarly as Theorems 1 and 2 in [33] 
with r = \/2{di + d2Jn)/coe for any given e >0. 



A. 2. Proof of Theorem 2. By Taylor's expansion, one has 

(16) g„(3„,7„) - g„(/3o,7o) = -1^n{Pn,ln 



7n -7o 

where /3* = ti/3„ + (1 — ti)/3o; and 7* = t27n + (1 ~ ^2)70 foi' some ti,t2 G 
(0,1). Let n,(/3,7) = Ai(/3,7)V-H/3,7), for 1 < i < n. Then 

-DnifSn^n) = (/3o, 7o) + Hn.i (/S;, 7^) + H^.s (/S^, 7:) + H^.s + 0(nT J"^) , 
where Ur,,i{(3*^,Yn) = -E7=lmMP*n,lnh^, 

^nMnn) = X]DTri,(/3:,7:)A,(/3r,7r)D, (5 1 ^° 

n„,3 = *n(/3o'7o) - ^n(/3n.7n)) where ni(/3*,7,*) is the first order deriva- 
tive of nj(/3,7) evaluated at {(3^ , which is a rrii x nii x (di + d2Jn)- 
dimensional array, is between /3* and /3o, and 7** is between 7* and 79. 
By conditions (C3) and (C4) and (15), for any given vector a„ G 
with 1 1 On 1 1 = 1, there exists a constant < c < 00, such that with proba- 
bility approaching 1, aJ^„(/3Q,7Q)a„ > crixA™™. By Theorem 1 and (15), 
allVn,2{Pn,l*n)an = OpiKn- Since E{Tln,i{l3*^,il)\X , Z] = it can be 
proved by Bernstein's inequality of [1] a^n„^i(/3* ,7*)an = Op{(nTlognT)i/2}. 

By condition (CI), A™'^'^ = 0(C^^) = o(raTA^'" JrT^^^). Therefore, ^'„(/3o,7o) 
dominates nn,i(/3* ,7;!;) and nn,2(/3n,7n)' and by Theorem 1, ^'„(/3o,7o) 
dominates n„^3(/3* ,7*). Thus, from (16), one has 



(17) 



*n(/3o,7o)"'gn(/3o,7o){l + Op(l)}. 



,7n, -7o, 

Let Ajo = Aj(/3o,7o) and Vjo = Vj(/3Q,7o). To obtain the closed-form ex- 
pression of /9„ ~ f^Q, we need the following block form of the inverse of 
Er=iD7A,oVr„iA,oD,: 



/ n n \ 

^5^X?A,oVroiA,oX. 5]x?A,oVr„iA,oB 



-1 



1=1 



1=1 



(18) 



^BTA,oVrQiA,oX. J^BTA^oVr^iA^oB, 



i=l 



Hxx HxB 
Hbx Hbb 



i=l 
-1 /ttH 



H21 
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where H^^ = (Hxx — HxBHggHBx) ^, H^^ = (Hbb — HexH^xUxb) ^, 
=^-H^1HxbHb^, and H^^ = -H22HbxHxx- Consequently, 3„ - 
/5o = (/^n,e + 3n,/.){l + Op(l)}, m which 

( n n '^ 



. 1=1 



1=1 



11 



d2 



i=l 



1=1 



d2 



i=l 



1=1 



/^(XA + Mi7o) 



Lemma 1. Under condition (A4), there are constants < chi < Chi < 
oo, such that with probability approaching 1, for nx sufficiently large, 
CHi(Ar"nT)-^Idi < < CHAK^-rny^Id, with U^Hn (18). 

Proof. The proof of Lemma 1 follows the same fashion as the proof of 
Lemma A. 4 in [21], and is hence omitted. □ 

Lemma 2. Under conditions (A2) and (A4), = Op{{\T''/K''') x 

Proof.^ Let A/i(r,.) = ^(X,/3o + Efii^ra(ZiO) - /^(X^/^o + B,7o) = 
{A//(r7ij)}J!!i, then 



11 



E xT^,oVroi{A/i(r,^)} - HxeH^^ B^AioV-H A/i(r,^)} 



.i=l 

n 



1=1 



ExT^,oV-oM{A^(r7^)} - n„{A/.(^^)}] = nTH^W, 



i=l 



where W = (Wi , VF^j ), with 



5^(xf ))^A,oVr,i[{A^(r7^)} - n„{A^(r7^)}] 



i=l 



< CAr^n^i E E|X,,fc{A/i(7?,,)} - n„,{A^(r?,,)}|. 
i=i j=i 
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Following similar reasoning as in the proof of Lemma A. 5 in [21], it can 



be proved that n"^ Y2=x ET=i \Xijk{^fi{vij)} - Un{A^i{i],,)}\ = Op{Jn^^). 



Therefore, \Wk\ = Op{X^^^Jn )• By the above result and Lemma 1, one 
has = Op{(Ar")-^A-- Jn-^n. □ 

Lemma 3. Under conditions (A2)-(A4), as ut — >• oo, {Pne) — ^ 

N{Q,ldi), where H„ is defined in (9). 

Proof. Lemma 3 can be proved by using the Linder berg-Feller CLT 
and similar techniques for the proofs of Lemmas A. 6 and A. 7 in [21]. □ 

Lemma 4. Under conditions (A2) and (A4), there exist constants < 
cs < Ch < oo, such that 

and ||3„,ell =Op{nT'/2(C-)i/2(A-in)-i/2|_ 

Proof. For any vector a G R^^ with ||a|| = 1, one has 



a^E^a > X^A^oVr^iA^oX, j | > csn^'iKn'' r^''\ 

and the second result in Lemma 4 follows from Chebyshev's inequality. □ 

Proof of Theorem 2. By Lemmas 2 and 4, for any vector a G R''-^ 
with ||a|| = 1, one has 

Therefore, Theorem 2 follows from Lemma 3, the above result and Slutsky's 
theorem. □ 

A. 3. Proof of Theorem 3. Following the same reasoning as deriving (17), 
it can be proved that 

(19) 

= (7Lj+7L,;)(i + 0p(i)), 
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where 

7n,eJ = ln,e,liPo^^-lo) 



1=1 



n 
i=l 

^ I'^i ' 

By the decomposition in (19), 

0ro(^/) - QiM) = + Bf (zz)^7fo - 

X(l + 0p(l)). 

It can be proved by the Linderberg-Feller CLT that as nx — oo, 

{Bf{zifE:^,Bf{zi))-'/\Bf{zif^l,^,) N{0, 1). 
Fohowing similar reasoning as in the proofs in Lemma 5, it can be proved 
sup l7f,;.,.zl =Op{(Ar)"'Ar^(4)-^"^/2} 

and 

ll7t,/lloo = Op{(lognT/nT)^/'(rr^)'/'(Ar)"'/'}. 
By B-sphne properties, sup,,g[o,i] i^ifjn,t.,i\ = OpKXTVK'niJnm, 



and sup,,e[o,i] \^f{^iflie,i\ = OpWilogm)Ji/nTiTrVKn'^'}, so 



sup \Oloizi) - eio{zi)\ < sup |Bf (zz)'^7„_^ ;| 
2,G[o,i] zie[o,i] 

+ sup \Bf{zif-ri,o-Oio{zi)\ 

=op{(Ar)"'Ar^(4)-^}, 



SUP.,e[0,l]l<i(^/,/3o,^-w)-^ro(^OI=Op{V(log™T)J;5/^^(^max/^min)^ 
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A.4. Proof of Theorem 4. 

Lemma 5. Under conditions (A2)-(A4), 

Il7n - Toll = op{j'j'n-'/\rrvKn'^' + i^rvK^Jn'}, 

Win - Tolloo = Op{(lognT/nT)^/2(rr7Ar + i^TV ^T^Jn""'^'}- 

Proof. From (17) and (18), one obtains 7„ — 7o = (7n.e + 7n.,/x)(l + 
Op(l)), where 



. i=l j=l J 

,i=l I V «=1 / 

n f / '^^ \ 



~ _ tt22 
7n,M - -H 



i=l k \ 1=1 



-/x(Xi/3o + B,7o) 

It can be proved that there exist constants < < < co, such that 
with probabihty approaching 1, for nx sufficiently large, 

Letting n„^x be the projection on {Xi}"=i to the empirical inner product. 



7„,^ = J] BTA,oVroi[{A^(r,^^)} - n„,x{A^(r,^)}] = mU'^W, 

i=l 

where W = {Wi Wj„d2 ) , with 

n 

Ws,i = E(B?''^)^^^o^ro'[{A^(r?,^)} - UnM^Kv^ 

1=1 

B^''^ = [{Bs,i{Ziii), . . . , Bs^i{Zir,ni)}'^]. The Cauchy-Schwarz inequality im- 
plies 

n nii 

\Ws,i\ < CAr"nT'EEl^-''(^^J')i^^(^*:')} " 
i=i j=i 

<CAr^||B,,||„,||A/.-n„,x(AM)||„^=Op(ArV„"^'"i/2), 

thus, ii7n,Mii =op{(ArvAr")^"^}, ii7„„;.iioo =op{(ArvAr")^"''"'^'}- 

For any to G TZ-^^^i ^j^]^ ||^|| = x, it can be proved that Var(aj^7„ JX,Z) < 
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Op{nT^(C^^/A™'^)}, thus, t^^7„,e = Op{nT^/^(C^^/AJf^'^)i/2}_ Therefore, 
||7n,ell < «^n^V^7n,el =C'p{J;l/^n~^/^(C^^/Ar>^)i/2}, and by Bernstein's 
inequahty of [1] that ||7„,,el|oo = Op{(lognT/nT)^/2(C''''/^n''')^^^}- ^ 

Lemma 6. Under conditions (A2)-(A4), 

ii7ff - 7°jiioo = o,{(Ar7Ar)'(ykwo^+ (4)"'/'^-^) }• 

Proof. Let 0_/o = {9i'o{-),l' / /}, where 9i'o{-) is defined in (8). Let 
Jn-l = {ln,sV ■■l<s<Jn,l'^l)^ and 7„;o = {lsl',0 : 1 < s < J„, r / By 
t^he Taylor expansion, gf^;(7°,f ,3„,^n,-0-gf/7n?,3„,^-zo) = {5gf,;(7n,f, 
3„,0-«)/57'^/}(7n^-/-7-/o)> where = tj^^g + (1 - 1)7„ for tG (0,1). 
Let A, = A,(3„,0_,,7°f), V, = Vi(3„, 7°f ), £i = £i - Hn.xfe), 
A/.(^.) = A^(r,^) - n„,x{A^(r7j}, B,,-_, = {(B^, „/' ^ 
Bj = {(Bji „i, . . . ,Bi„^ Thus, by (6) and the proofs for 

Lemma 5, with probabihty approaching 1, there are constants < C1C2 < 
00 such that 

hil{lnfAX,^l)-sil{lnfA,0^l0)\\oo 
f n XT" 

^(Bf,)TA,VriB,„, ^BT„^A,oVrQi(I, + ^(ZZ.)) 

\i=\ / U=l 

<C2(Ar)^'(IICl||oo + ||C2||oo), 

where Ci = nTHEr=i(Bf,)^A,oV-iB,_a{Er=i B^_,A,oV-i(A;u(r,^))}, 
C2 = r.^HEr=i(Bf,)^A,oV-iB,_a(Er=iB^_,A,oV-ilJ, and then 

llCilloo < (Ar")2||C3||ooO(J„-0, where Cs = Ai + A2 + A3, Ai = {bxs)\h. 



A2 = (52s)s=i and A3 = {hs)s=\ with b^s = n^j. ^"^^ Ji^^j, 62s = n-p XlILi '^2s,i 

rrii d2 Jn 



and (53s = n^p^ ^^"^^ ^3 



i=i/'=i,/V's'=i 
52s,i = ^ ^ ^ ^ I -B^/ ) 1 1 Bs',i' [Zijv ) 1 1 (^i j'i' ) I , 

hs,i = ^^^^ ^|-B^/(^ij7)||-Bs',r(^ij7')||-Bs',r(^iY/') 

j = l i'jti j' I'^l s' = l 
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Let j = 5is^i — E{5is^i). It can be proved by B-spline properties that 
Ei6u,i)>^m,Jn/y/Ji, E{6lJ = 0, E{5l^.f ^ m,Jl + mpl{J^)-\ and 
E{\51^^^\^) < C7{?niJ^(J;f)'=/2-i + m24(J;f)'=/2"2} for A; > 3 and some con- 
stant 'C>0. Thus, E{\51J^) < {C'{J^Y'^Jnf~^k\E{5l^^^^i,^,) with C = 
(ji/{k-2) ^ By Bernstein's inequaUty in [1], 

> t I < 2exp<' 



i=l 



4Er=i^(5L.)^ + 2C'(j5)i/2j„t/- 



Let t = c{{nT + {Ya=i '^D JniJu)"^) lognx}^/^ for a large constant < 
c < oo. There is a constant < c' < oo such that E{6\g^)'^ < c'{miJ^ + 

mp^J^r'}. For J;f = 0((lognT)"in^/'mJ/J), one has P{\E7=l^ls,^\ > 

—c^ /(Ac') 

t) < 2nrp ' . By the Borel-Cantelli lemma, 

max =Oa...{n-'/V„(l + m(„)/4)^/2(lognT)^/2}. 

i<s<J;? 



Since E{6is) x Jn/y/Ji, one has ||Ai||oo = Oa.s.(-'n/\A^)- Since E{62s) x 
'n^^{^^^i tn^)/ -s/j^ and E{5^s) ^ i^i/ \/~J^^ similarly it can be proved that 
IIA2II00 = Oa.s.("i(n)/v^) and IIA3II00 = Oa.s.(^T/\/^)- Therefore, ||Ci||oo = 
Oa.s.{(A™^^)2nT(J;f)"^/V^^}. Following similar reasoning, by Bernstein's 
inequality one can prove IIC2II00 = Oa.s. {{^^^^)'^\/ '^T log ni/ J^). Thus, 

where = Cnin-rlogrn / J^)^/"^ and 6„ = Cnni{J^)~'^/'^ Jn^ with c„ = 
(A™™)~^(A™'^^)^. Following similar reasoning, one can prove that ||gf /(7^/^, 
3„,0_,o)-gf/7°?'/3>^-/o)||oo = Op(a„ + d„), where (i„ = Cnn^iJ^)-^/'^ 3'"^^ , 
l|gf/7n^/3,0-/o)-gf/7n^/9,^-Olloo = Op(6„),w 

0. Thus, ||gf^;(7n,f i3n>^n-z)||oo = C)p(a„ + By the Taylor expansion, 
there is t e (0, 1) such that 7„^; = t^^J + (1 - t)^l% 

dy^^i{jr^AX,-j)/djl, = K{l + Op{l)), with^A„ = Er=i(Bfj^A,Vri x 

AjBf^, Aj = Ai(/3„,0„ _/,7„ i) and Vj = Vj(/3„, 0„ 7„^;). There exist 
constants < C3 < C3 < oo, such that with probability 1, for nx sufficiently 
large, caA-'^nx < A„,in(A„) < A^ax(An) < CgAr^nx. By Theorem 13.4.3 
of [4], one has ||A-i|U = Oa.s.{(Ar"nT)-^}. Therefore, 

■lnf\L<\\{dsili7n,lAX,^l)/dlll}-'^ 



-55 -OR I 



□ 
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Proof of Theorem 4. By Lemma 6, 

sup \0^jizi,p^,dn-i) -e^/zi,Po,e_io)\ 

z,€[0,l] 

s=l 

= 0,{(Ar7Ar)'(VlognT/nT + Jn')}- 
By the above result and (11), 

sup |(Bf(zO''H;iBf(zO)"'/'{^f/^/,3n,0n,-O-^f/^Z,/3o,^-w)}|=Op(l). 

ziG[0,l] 

Thus, the asymptotic normality oi 9^ ^{zi, f3n,0n,-i) follows from Theorem 3, 
the above result and Slutsky's theorem. □ 
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